{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wSFbIMb87cHu"
   },
   "source": [
    "# **Bioinformatics Project - Computational Drug Discovery [Part 1] Download Bioactivity Data (Concised version)**\n",
    "\n",
    "Chanin Nantasenamat\n",
    "\n",
    "[*'Data Professor' YouTube channel*](http://youtube.com/dataprofessor)\n",
    "\n",
    "In this Jupyter notebook, we will be building a real-life **data science project** that you can include in your **data science portfolio**. Particularly, we will be building a machine learning model using the ChEMBL bioactivity data.\n",
    "\n",
    "In **Part 1**, we will be performing Data Collection and Pre-Processing from the ChEMBL Database.\n",
    "\n",
    "Note for this Concised Version:\n",
    "* Redundant code cells were deleted.\n",
    "* Code cells for saving files to Google Drive has been deleted.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3iQiERxumDor"
   },
   "source": [
    "## **ChEMBL Database**\n",
    "\n",
    "The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) is a database that contains curated bioactivity data of more than 2 million compounds. It is compiled from more than 76,000 documents, 1.2 million assays and the data spans 13,000 targets and 1,800 cells and 33,000 indications.\n",
    "[Data as of March 25, 2020; ChEMBL version 26]."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iryGAwAIQ4yf"
   },
   "source": [
    "## **Installing libraries**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "toGT1U_B7F2i"
   },
   "source": [
    "Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "cJGExHQBfLh7",
    "outputId": "bd9910cf-c63d-49ef-baa9-1eb9b2a6b286"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting chembl_webresource_client\n",
      "  Downloading chembl_webresource_client-0.10.7-py3-none-any.whl (55 kB)\n",
      "\u001b[?25l\r",
      "\u001b[K     |██████                          | 10 kB 22.9 MB/s eta 0:00:01\r",
      "\u001b[K     |███████████▉                    | 20 kB 26.9 MB/s eta 0:00:01\r",
      "\u001b[K     |█████████████████▊              | 30 kB 12.2 MB/s eta 0:00:01\r",
      "\u001b[K     |███████████████████████▋        | 40 kB 9.3 MB/s eta 0:00:01\r",
      "\u001b[K     |█████████████████████████████▌  | 51 kB 5.2 MB/s eta 0:00:01\r",
      "\u001b[K     |████████████████████████████████| 55 kB 2.1 MB/s \n",
      "\u001b[?25hRequirement already satisfied: easydict in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (1.9)\n",
      "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (1.24.3)\n",
      "Collecting requests-cache~=0.7.0\n",
      "  Downloading requests_cache-0.7.5-py3-none-any.whl (39 kB)\n",
      "Requirement already satisfied: requests>=2.18.4 in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (2.23.0)\n",
      "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (3.0.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (2021.10.8)\n",
      "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (2.10)\n",
      "Requirement already satisfied: attrs<22.0,>=21.2 in /usr/local/lib/python3.7/dist-packages (from requests-cache~=0.7.0->chembl_webresource_client) (21.2.0)\n",
      "Collecting url-normalize<2.0,>=1.4\n",
      "  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)\n",
      "Collecting pyyaml>=5.4\n",
      "  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
      "\u001b[K     |████████████████████████████████| 596 kB 10.7 MB/s \n",
      "\u001b[?25hCollecting itsdangerous>=2.0.1\n",
      "  Downloading itsdangerous-2.0.1-py3-none-any.whl (18 kB)\n",
      "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from url-normalize<2.0,>=1.4->requests-cache~=0.7.0->chembl_webresource_client) (1.15.0)\n",
      "Installing collected packages: url-normalize, pyyaml, itsdangerous, requests-cache, chembl-webresource-client\n",
      "  Attempting uninstall: pyyaml\n",
      "    Found existing installation: PyYAML 3.13\n",
      "    Uninstalling PyYAML-3.13:\n",
      "      Successfully uninstalled PyYAML-3.13\n",
      "  Attempting uninstall: itsdangerous\n",
      "    Found existing installation: itsdangerous 1.1.0\n",
      "    Uninstalling itsdangerous-1.1.0:\n",
      "      Successfully uninstalled itsdangerous-1.1.0\n",
      "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
      "flask 1.1.4 requires itsdangerous<2.0,>=0.24, but you have itsdangerous 2.0.1 which is incompatible.\u001b[0m\n",
      "Successfully installed chembl-webresource-client-0.10.7 itsdangerous-2.0.1 pyyaml-6.0 requests-cache-0.7.5 url-normalize-1.4.3\n"
     ]
    }
   ],
   "source": [
    "! pip install chembl_webresource_client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "J0kJjL8gb5nX"
   },
   "source": [
    "## **Importing libraries**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "RXoCvMPPfNrv"
   },
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "import pandas as pd\n",
    "from chembl_webresource_client.new_client import new_client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1FgUai1bfigC"
   },
   "source": [
    "## **Search for Target protein**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7lBsDrD0gAqH"
   },
   "source": [
    "### **Target search for coronavirus**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 195
    },
    "id": "Vxtp79so4ZjF",
    "outputId": "278b1c9d-1549-414e-9cfd-6e458e0650ef"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>cross_references</th>\n",
       "      <th>organism</th>\n",
       "      <th>pref_name</th>\n",
       "      <th>score</th>\n",
       "      <th>species_group_flag</th>\n",
       "      <th>target_chembl_id</th>\n",
       "      <th>target_components</th>\n",
       "      <th>target_type</th>\n",
       "      <th>tax_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[{'xref_id': 'Q61214', 'xref_name': None, 'xre...</td>\n",
       "      <td>Mus musculus</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>13.0</td>\n",
       "      <td>False</td>\n",
       "      <td>CHEMBL4750</td>\n",
       "      <td>[{'accession': 'Q61214', 'component_descriptio...</td>\n",
       "      <td>SINGLE PROTEIN</td>\n",
       "      <td>10090</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[{'xref_id': 'Q63470', 'xref_name': None, 'xre...</td>\n",
       "      <td>Rattus norvegicus</td>\n",
       "      <td>Dual specificity tyrosine-phosphorylation-regu...</td>\n",
       "      <td>13.0</td>\n",
       "      <td>False</td>\n",
       "      <td>CHEMBL5508</td>\n",
       "      <td>[{'accession': 'Q63470', 'component_descriptio...</td>\n",
       "      <td>SINGLE PROTEIN</td>\n",
       "      <td>10116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[{'xref_id': 'Q13627', 'xref_name': None, 'xre...</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>12.0</td>\n",
       "      <td>False</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>[{'accession': 'Q13627', 'component_descriptio...</td>\n",
       "      <td>SINGLE PROTEIN</td>\n",
       "      <td>9606</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    cross_references  ... tax_id\n",
       "0  [{'xref_id': 'Q61214', 'xref_name': None, 'xre...  ...  10090\n",
       "1  [{'xref_id': 'Q63470', 'xref_name': None, 'xre...  ...  10116\n",
       "2  [{'xref_id': 'Q13627', 'xref_name': None, 'xre...  ...   9606\n",
       "\n",
       "[3 rows x 9 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Target search for coronavirus\n",
    "target = new_client.target\n",
    "target_query = target.search('sars-cov-2')\n",
    "targets = pd.DataFrame.from_dict(target_query)\n",
    "targets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Y5OPfEALjAfZ"
   },
   "source": [
    "### **Select and retrieve bioactivity data for *SARS coronavirus 3C-like proteinase* (fifth entry)**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gSQ3aroOgML7"
   },
   "source": [
    "We will assign the fifth entry (which corresponds to the target protein, *coronavirus 3C-like proteinase*) to the ***selected_target*** variable "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "id": "StrcHMVLha7u",
    "outputId": "009f0ba3-7d5f-4df3-f9e7-52afc944b33e"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'CHEMBL2292'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "selected_target = targets.target_chembl_id[2]\n",
    "selected_target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GWd2DRalgjzB"
   },
   "source": [
    "Here, we will retrieve only bioactivity data for *coronavirus 3C-like proteinase* (CHEMBL3927) that are reported as IC$_{50}$ values in nM (nanomolar) unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "LeFbV_CsSP8D"
   },
   "outputs": [],
   "source": [
    "activity = new_client.activity\n",
    "res = activity.filter(target_chembl_id=selected_target).filter(standard_type=\"IC50\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "RC4T-NEmSWV-"
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame.from_dict(res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 963
    },
    "id": "lxCsx5bFnd67",
    "outputId": "f7ecda02-fb34-4a8c-d06c-f3faf82e51f7"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_comment</th>\n",
       "      <th>activity_id</th>\n",
       "      <th>activity_properties</th>\n",
       "      <th>assay_chembl_id</th>\n",
       "      <th>assay_description</th>\n",
       "      <th>assay_type</th>\n",
       "      <th>assay_variant_accession</th>\n",
       "      <th>assay_variant_mutation</th>\n",
       "      <th>bao_endpoint</th>\n",
       "      <th>bao_format</th>\n",
       "      <th>bao_label</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>data_validity_comment</th>\n",
       "      <th>data_validity_description</th>\n",
       "      <th>document_chembl_id</th>\n",
       "      <th>document_journal</th>\n",
       "      <th>document_year</th>\n",
       "      <th>ligand_efficiency</th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>molecule_pref_name</th>\n",
       "      <th>parent_molecule_chembl_id</th>\n",
       "      <th>pchembl_value</th>\n",
       "      <th>potential_duplicate</th>\n",
       "      <th>qudt_units</th>\n",
       "      <th>record_id</th>\n",
       "      <th>relation</th>\n",
       "      <th>src_id</th>\n",
       "      <th>standard_flag</th>\n",
       "      <th>standard_relation</th>\n",
       "      <th>standard_text_value</th>\n",
       "      <th>standard_type</th>\n",
       "      <th>standard_units</th>\n",
       "      <th>standard_upper_value</th>\n",
       "      <th>standard_value</th>\n",
       "      <th>target_chembl_id</th>\n",
       "      <th>target_organism</th>\n",
       "      <th>target_pref_name</th>\n",
       "      <th>target_tax_id</th>\n",
       "      <th>text_value</th>\n",
       "      <th>toid</th>\n",
       "      <th>type</th>\n",
       "      <th>units</th>\n",
       "      <th>uo_units</th>\n",
       "      <th>upper_value</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>None</td>\n",
       "      <td>1476106</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '30.22', 'le': '0.56', 'lle': '5.88', ...</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>6.96</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382064</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>None</td>\n",
       "      <td>1476649</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '15.51', 'le': '0.29', 'lle': '4.39', ...</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>5.31</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382011</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>1701840</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL861010</td>\n",
       "      <td>Inhibition of human recombinant DYRK1a</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>O=c1oc2c(O)c(O)cc3c(=O)oc4c(O)c(O)cc1c4c23</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1146093</td>\n",
       "      <td>J. Med. Chem.</td>\n",
       "      <td>2006</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL6246</td>\n",
       "      <td>ELLAGIC ACID</td>\n",
       "      <td>CHEMBL6246</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>448549</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>40000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>None</td>\n",
       "      <td>1751439</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL869157</td>\n",
       "      <td>Inhibition of DYRK1a</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CC(C)c1nnc2ccc(-c3ocnc3-c3ccc(F)cc3Cl)cn12</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1145312</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2006</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL215652</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL215652</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>528643</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>None</td>\n",
       "      <td>1751440</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL869157</td>\n",
       "      <td>Inhibition of DYRK1a</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CC(C)c1nnc2ccc(-c3ocnc3-c3cc(F)c(F)cc3F)cn12</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1145312</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2006</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL213423</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL213423</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>528646</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1156</th>\n",
       "      <td>Slightly Active</td>\n",
       "      <td>20739000</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4511049</td>\n",
       "      <td>In vitro kinase assay (DYRK1A)</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN1CCN(C(=O)C(C)(C)c2ccc(C(=O)Nc3cn4cc(-c5ccnc...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4507307</td>\n",
       "      <td>None</td>\n",
       "      <td>2021</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4531334</td>\n",
       "      <td>T3-CLK</td>\n",
       "      <td>CHEMBL4531334</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3359743</td>\n",
       "      <td>None</td>\n",
       "      <td>54</td>\n",
       "      <td>True</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>260.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>260.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1157</th>\n",
       "      <td>Slightly Active</td>\n",
       "      <td>20739005</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4511054</td>\n",
       "      <td>NanoBRET (SGC Frankfurt) (DYRK1A)</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN1CCN(C(=O)C(C)(C)c2ccc(C(=O)Nc3cn4cc(-c5ccnc...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4507307</td>\n",
       "      <td>None</td>\n",
       "      <td>2021</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4531334</td>\n",
       "      <td>T3-CLK</td>\n",
       "      <td>CHEMBL4531334</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3359743</td>\n",
       "      <td>None</td>\n",
       "      <td>54</td>\n",
       "      <td>True</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>32.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>32.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1158</th>\n",
       "      <td>Not Active</td>\n",
       "      <td>20739013</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4511054</td>\n",
       "      <td>NanoBRET (SGC Frankfurt) (DYRK1A)</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN1CCN(C(=O)C(C)(C)c2ccc(C(=O)Nc3cn4cc(-c5cc(C...</td>\n",
       "      <td>Non standard unit for type</td>\n",
       "      <td>Units for this activity type are unusual and m...</td>\n",
       "      <td>CHEMBL4507307</td>\n",
       "      <td>None</td>\n",
       "      <td>2021</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4576555</td>\n",
       "      <td>T3-CLK-N</td>\n",
       "      <td>CHEMBL4576555</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>None</td>\n",
       "      <td>3359744</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>54</td>\n",
       "      <td>False</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>µM</td>\n",
       "      <td>None</td>\n",
       "      <td>10.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>µM</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1159</th>\n",
       "      <td>None</td>\n",
       "      <td>20764795</td>\n",
       "      <td>[{'comments': None, 'relation': None, 'result_...</td>\n",
       "      <td>CHEMBL4512209</td>\n",
       "      <td>DYRK1A(DY1ALGP1) Takeda global kinase panel</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN1C(=O)[C@@H](N2CCc3c(nn(Cc4ccccc4)c3Br)C2=O)...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4507326</td>\n",
       "      <td>None</td>\n",
       "      <td>2021</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4549667</td>\n",
       "      <td>TP-030-2</td>\n",
       "      <td>CHEMBL4549667</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3359780</td>\n",
       "      <td>None</td>\n",
       "      <td>54</td>\n",
       "      <td>True</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>pIC50</td>\n",
       "      <td>None</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1160</th>\n",
       "      <td>None</td>\n",
       "      <td>20765156</td>\n",
       "      <td>[{'comments': None, 'relation': None, 'result_...</td>\n",
       "      <td>CHEMBL4512783</td>\n",
       "      <td>DYRK1A(DY1ALGP1) Takeda global kinase panel</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN1C(=O)[C@@H](N2CCc3cn(CC4CCS(=O)(=O)CC4)nc3C...</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4507327</td>\n",
       "      <td>None</td>\n",
       "      <td>2021</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4097778</td>\n",
       "      <td>TP-030n</td>\n",
       "      <td>CHEMBL4097778</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3489319</td>\n",
       "      <td>None</td>\n",
       "      <td>54</td>\n",
       "      <td>True</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>pIC50</td>\n",
       "      <td>None</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1161 rows × 45 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     activity_comment  activity_id  ... upper_value   value\n",
       "0                None      1476106  ...        None   110.0\n",
       "1                None      1476649  ...        None  4900.0\n",
       "2                None      1701840  ...        None    40.0\n",
       "3                None      1751439  ...        None    10.0\n",
       "4                None      1751440  ...        None    10.0\n",
       "...               ...          ...  ...         ...     ...\n",
       "1156  Slightly Active     20739000  ...        None   260.0\n",
       "1157  Slightly Active     20739005  ...        None    32.0\n",
       "1158       Not Active     20739013  ...        None    10.0\n",
       "1159             None     20764795  ...        None     6.0\n",
       "1160             None     20765156  ...        None     6.0\n",
       "\n",
       "[1161 rows x 45 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 319
    },
    "id": "s9iUAXFdSkoM",
    "outputId": "72fa5145-358b-4927-ae62-139afa2d1e0c"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_comment</th>\n",
       "      <th>activity_id</th>\n",
       "      <th>activity_properties</th>\n",
       "      <th>assay_chembl_id</th>\n",
       "      <th>assay_description</th>\n",
       "      <th>assay_type</th>\n",
       "      <th>assay_variant_accession</th>\n",
       "      <th>assay_variant_mutation</th>\n",
       "      <th>bao_endpoint</th>\n",
       "      <th>bao_format</th>\n",
       "      <th>bao_label</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>data_validity_comment</th>\n",
       "      <th>data_validity_description</th>\n",
       "      <th>document_chembl_id</th>\n",
       "      <th>document_journal</th>\n",
       "      <th>document_year</th>\n",
       "      <th>ligand_efficiency</th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>molecule_pref_name</th>\n",
       "      <th>parent_molecule_chembl_id</th>\n",
       "      <th>pchembl_value</th>\n",
       "      <th>potential_duplicate</th>\n",
       "      <th>qudt_units</th>\n",
       "      <th>record_id</th>\n",
       "      <th>relation</th>\n",
       "      <th>src_id</th>\n",
       "      <th>standard_flag</th>\n",
       "      <th>standard_relation</th>\n",
       "      <th>standard_text_value</th>\n",
       "      <th>standard_type</th>\n",
       "      <th>standard_units</th>\n",
       "      <th>standard_upper_value</th>\n",
       "      <th>standard_value</th>\n",
       "      <th>target_chembl_id</th>\n",
       "      <th>target_organism</th>\n",
       "      <th>target_pref_name</th>\n",
       "      <th>target_tax_id</th>\n",
       "      <th>text_value</th>\n",
       "      <th>toid</th>\n",
       "      <th>type</th>\n",
       "      <th>units</th>\n",
       "      <th>uo_units</th>\n",
       "      <th>upper_value</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>None</td>\n",
       "      <td>1476106</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '30.22', 'le': '0.56', 'lle': '5.88', ...</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>6.96</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382064</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>None</td>\n",
       "      <td>1476649</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '15.51', 'le': '0.29', 'lle': '4.39', ...</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>5.31</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382011</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>None</td>\n",
       "      <td>1701840</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL861010</td>\n",
       "      <td>Inhibition of human recombinant DYRK1a</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>O=c1oc2c(O)c(O)cc3c(=O)oc4c(O)c(O)cc1c4c23</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1146093</td>\n",
       "      <td>J. Med. Chem.</td>\n",
       "      <td>2006</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL6246</td>\n",
       "      <td>ELLAGIC ACID</td>\n",
       "      <td>CHEMBL6246</td>\n",
       "      <td>None</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>448549</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>&gt;</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>40000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  activity_comment  activity_id  ... upper_value   value\n",
       "0             None      1476106  ...        None   110.0\n",
       "1             None      1476649  ...        None  4900.0\n",
       "2             None      1701840  ...        None    40.0\n",
       "\n",
       "[3 rows x 45 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fQ78N26Fg15T"
   },
   "source": [
    "Finally we will save the resulting bioactivity data to a CSV file **bioactivity_data.csv**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "ZvUUEIVxTOH1"
   },
   "outputs": [],
   "source": [
    "df.to_csv('Dyrk1a_bioactivity_data_raw.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_GXMpFNUOn_8"
   },
   "source": [
    "## **Handling missing data**\n",
    "If any compounds has missing value for the **standard_value** column then drop it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 963
    },
    "id": "hkVOdk6ZR396",
    "outputId": "4baebe35-2fff-4823-c322-615ade411811"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>activity_comment</th>\n",
       "      <th>activity_id</th>\n",
       "      <th>activity_properties</th>\n",
       "      <th>assay_chembl_id</th>\n",
       "      <th>assay_description</th>\n",
       "      <th>assay_type</th>\n",
       "      <th>assay_variant_accession</th>\n",
       "      <th>assay_variant_mutation</th>\n",
       "      <th>bao_endpoint</th>\n",
       "      <th>bao_format</th>\n",
       "      <th>bao_label</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>data_validity_comment</th>\n",
       "      <th>data_validity_description</th>\n",
       "      <th>document_chembl_id</th>\n",
       "      <th>document_journal</th>\n",
       "      <th>document_year</th>\n",
       "      <th>ligand_efficiency</th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>molecule_pref_name</th>\n",
       "      <th>parent_molecule_chembl_id</th>\n",
       "      <th>pchembl_value</th>\n",
       "      <th>potential_duplicate</th>\n",
       "      <th>qudt_units</th>\n",
       "      <th>record_id</th>\n",
       "      <th>relation</th>\n",
       "      <th>src_id</th>\n",
       "      <th>standard_flag</th>\n",
       "      <th>standard_relation</th>\n",
       "      <th>standard_text_value</th>\n",
       "      <th>standard_type</th>\n",
       "      <th>standard_units</th>\n",
       "      <th>standard_upper_value</th>\n",
       "      <th>standard_value</th>\n",
       "      <th>target_chembl_id</th>\n",
       "      <th>target_organism</th>\n",
       "      <th>target_pref_name</th>\n",
       "      <th>target_tax_id</th>\n",
       "      <th>text_value</th>\n",
       "      <th>toid</th>\n",
       "      <th>type</th>\n",
       "      <th>units</th>\n",
       "      <th>uo_units</th>\n",
       "      <th>upper_value</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>None</td>\n",
       "      <td>1476106</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '30.22', 'le': '0.56', 'lle': '5.88', ...</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>6.96</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382064</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>None</td>\n",
       "      <td>1476649</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL828004</td>\n",
       "      <td>Inhibitory concentration against selected kina...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1139578</td>\n",
       "      <td>Bioorg. Med. Chem. Lett.</td>\n",
       "      <td>2005</td>\n",
       "      <td>{'bei': '15.51', 'le': '0.29', 'lle': '4.39', ...</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>5.31</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>382011</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>4900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>None</td>\n",
       "      <td>1837933</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL920482</td>\n",
       "      <td>Inhibition of DYRK1A</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CN(C)c1nc2c(Br)c(Br)c(Br)c(Br)c2[nH]1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1149250</td>\n",
       "      <td>J. Med. Chem.</td>\n",
       "      <td>2004</td>\n",
       "      <td>{'bei': '14.52', 'le': '0.59', 'lle': '2.24', ...</td>\n",
       "      <td>CHEMBL376505</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL376505</td>\n",
       "      <td>6.92</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>629634</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>120.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>0.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>None</td>\n",
       "      <td>2137228</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL936939</td>\n",
       "      <td>Inhibition of DYRK1a</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>NS(=O)(=O)c1ccc(Nc2nc(OCC3CCCCC3)c3nc[nH]c3n2)cc1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1145498</td>\n",
       "      <td>Proc. Natl. Acad. Sci. U.S.A.</td>\n",
       "      <td>2007</td>\n",
       "      <td>{'bei': '15.02', 'le': '0.29', 'lle': '3.35', ...</td>\n",
       "      <td>CHEMBL319467</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL319467</td>\n",
       "      <td>6.05</td>\n",
       "      <td>True</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>705222</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>900.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>None</td>\n",
       "      <td>2211669</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL980537</td>\n",
       "      <td>Inhibition of human recombinant DYRK1A</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000357</td>\n",
       "      <td>single protein format</td>\n",
       "      <td>CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL1149828</td>\n",
       "      <td>J. Nat. Prod.</td>\n",
       "      <td>2008</td>\n",
       "      <td>{'bei': '13.39', 'le': '0.25', 'lle': '1.54', ...</td>\n",
       "      <td>CHEMBL14762</td>\n",
       "      <td>SELICICLIB</td>\n",
       "      <td>CHEMBL14762</td>\n",
       "      <td>4.75</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>742340</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>18000.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>uM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1150</th>\n",
       "      <td>None</td>\n",
       "      <td>20680054</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4622607</td>\n",
       "      <td>Inhibition of recombinant human N-terminal Hex...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000019</td>\n",
       "      <td>assay format</td>\n",
       "      <td>COc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4619818</td>\n",
       "      <td>ACS Med Chem Lett</td>\n",
       "      <td>2020</td>\n",
       "      <td>{'bei': '24.88', 'le': '0.45', 'lle': '4.98', ...</td>\n",
       "      <td>CHEMBL187081</td>\n",
       "      <td>PYRAZOLOPYRIDAZINE 1</td>\n",
       "      <td>CHEMBL187081</td>\n",
       "      <td>7.92</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3482667</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>12.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1151</th>\n",
       "      <td>None</td>\n",
       "      <td>20680055</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4622607</td>\n",
       "      <td>Inhibition of recombinant human N-terminal Hex...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000019</td>\n",
       "      <td>assay format</td>\n",
       "      <td>N#Cc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4619818</td>\n",
       "      <td>ACS Med Chem Lett</td>\n",
       "      <td>2020</td>\n",
       "      <td>{'bei': '26.49', 'le': '0.47', 'lle': '5.50', ...</td>\n",
       "      <td>CHEMBL495696</td>\n",
       "      <td>GW778894X</td>\n",
       "      <td>CHEMBL495696</td>\n",
       "      <td>8.30</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3482668</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>5.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1152</th>\n",
       "      <td>None</td>\n",
       "      <td>20680056</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4622607</td>\n",
       "      <td>Inhibition of recombinant human N-terminal Hex...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000019</td>\n",
       "      <td>assay format</td>\n",
       "      <td>N#Cc1ccc(Nc2nccc(-c3cnn4ncccc34)n2)cc1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4619818</td>\n",
       "      <td>ACS Med Chem Lett</td>\n",
       "      <td>2020</td>\n",
       "      <td>{'bei': '22.30', 'le': '0.40', 'lle': '4.19', ...</td>\n",
       "      <td>CHEMBL359794</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL359794</td>\n",
       "      <td>6.99</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3482669</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>103.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>103.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1153</th>\n",
       "      <td>None</td>\n",
       "      <td>20680057</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4622607</td>\n",
       "      <td>Inhibition of recombinant human N-terminal Hex...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000019</td>\n",
       "      <td>assay format</td>\n",
       "      <td>Cc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4619818</td>\n",
       "      <td>ACS Med Chem Lett</td>\n",
       "      <td>2020</td>\n",
       "      <td>{'bei': '22.50', 'le': '0.41', 'lle': '3.86', ...</td>\n",
       "      <td>CHEMBL4637319</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4637319</td>\n",
       "      <td>7.12</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3482670</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>76.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>76.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1154</th>\n",
       "      <td>None</td>\n",
       "      <td>20680058</td>\n",
       "      <td>[]</td>\n",
       "      <td>CHEMBL4622607</td>\n",
       "      <td>Inhibition of recombinant human N-terminal Hex...</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>BAO_0000190</td>\n",
       "      <td>BAO_0000019</td>\n",
       "      <td>assay format</td>\n",
       "      <td>COc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4619818</td>\n",
       "      <td>ACS Med Chem Lett</td>\n",
       "      <td>2020</td>\n",
       "      <td>{'bei': '21.10', 'le': '0.38', 'lle': '4.05', ...</td>\n",
       "      <td>CHEMBL4636572</td>\n",
       "      <td>None</td>\n",
       "      <td>CHEMBL4636572</td>\n",
       "      <td>7.01</td>\n",
       "      <td>False</td>\n",
       "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
       "      <td>3482671</td>\n",
       "      <td>=</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "      <td>=</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>None</td>\n",
       "      <td>97.0</td>\n",
       "      <td>CHEMBL2292</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
       "      <td>9606</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>IC50</td>\n",
       "      <td>nM</td>\n",
       "      <td>UO_0000065</td>\n",
       "      <td>None</td>\n",
       "      <td>97.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>842 rows × 45 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     activity_comment  activity_id  ... upper_value   value\n",
       "0                None      1476106  ...        None   110.0\n",
       "1                None      1476649  ...        None  4900.0\n",
       "10               None      1837933  ...        None    0.12\n",
       "11               None      2137228  ...        None   900.0\n",
       "14               None      2211669  ...        None    18.0\n",
       "...               ...          ...  ...         ...     ...\n",
       "1150             None     20680054  ...        None    12.0\n",
       "1151             None     20680055  ...        None     5.0\n",
       "1152             None     20680056  ...        None   103.0\n",
       "1153             None     20680057  ...        None    76.0\n",
       "1154             None     20680058  ...        None    97.0\n",
       "\n",
       "[842 rows x 45 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = df[df.pchembl_value.notna()]\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Y-qNsUlmjS25"
   },
   "source": [
    "Apparently, this dataset has no missing data, but the code cell above can be reused for the bioactivity data of other target proteins."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5H4sSFAWhV9B"
   },
   "source": [
    "## **Data pre-processing of the bioactivity data**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tO22XVlzhkXR"
   },
   "source": [
    "### **Labeling compounds as either being active, inactive or intermediate**\n",
    "The bioactivity data are IC50 values in nM. Compounds with an IC50 of 1,000 nM or less are considered **active**, those with an IC50 of 10,000 nM or more are considered **inactive**, and those with values between 1,000 and 10,000 nM are labeled **intermediate**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "1E8rz7oMOd-5"
   },
   "outputs": [],
   "source": [
    "bioactivity_class = []\n",
    "for i in df2.standard_value:\n",
    "  if float(i) >= 10000:\n",
    "    bioactivity_class.append(\"inactive\")\n",
    "  elif float(i) <= 1000:\n",
    "    bioactivity_class.append(\"active\")\n",
    "  else:\n",
    "    bioactivity_class.append(\"intermediate\")"
   ]
  },
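  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same thresholds can also be applied without an explicit loop. The block below is only a sketch under stated assumptions: the `demo` DataFrame and its values are hypothetical and not taken from ChEMBL, and `numpy.select` is swapped in for the loop above rather than being part of the original workflow.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical IC50 values in nM, stored as strings (illustrative only).\n",
    "demo = pd.DataFrame({'standard_value': ['110.0', '4900.0', '18000.0', '1000', '10000']})\n",
    "\n",
    "# Convert to numbers before comparing against the cut-offs.\n",
    "ic50 = pd.to_numeric(demo['standard_value'])\n",
    "\n",
    "# Same cut-offs as the loop: >= 10,000 nM inactive, <= 1,000 nM active,\n",
    "# everything in between intermediate.\n",
    "demo['bioactivity_class'] = np.select(\n",
    "    [ic50 >= 10000, ic50 <= 1000],\n",
    "    ['inactive', 'active'],\n",
    "    default='intermediate',\n",
    ")\n",
    "print(demo)\n",
    "```\n",
    "\n",
    "Because the labels are written straight into the DataFrame, they stay attached to their rows regardless of how the index is numbered."
   ]
  },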
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Nv2dzid_hzKd"
   },
   "source": [
    "### **Combine the 3 columns (molecule_chembl_id, canonical_smiles, standard_value) and bioactivity_class into a DataFrame**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 423
    },
    "id": "0NCLYmrASgha",
    "outputId": "ebe89776-7f65-4623-f641-7890a00bb31b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>standard_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>4900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>CHEMBL376505</td>\n",
       "      <td>CN(C)c1nc2c(Br)c(Br)c(Br)c(Br)c2[nH]1</td>\n",
       "      <td>120.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>CHEMBL319467</td>\n",
       "      <td>NS(=O)(=O)c1ccc(Nc2nc(OCC3CCCCC3)c3nc[nH]c3n2)cc1</td>\n",
       "      <td>900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>CHEMBL14762</td>\n",
       "      <td>CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1</td>\n",
       "      <td>18000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1150</th>\n",
       "      <td>CHEMBL187081</td>\n",
       "      <td>COc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1151</th>\n",
       "      <td>CHEMBL495696</td>\n",
       "      <td>N#Cc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1152</th>\n",
       "      <td>CHEMBL359794</td>\n",
       "      <td>N#Cc1ccc(Nc2nccc(-c3cnn4ncccc34)n2)cc1</td>\n",
       "      <td>103.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1153</th>\n",
       "      <td>CHEMBL4637319</td>\n",
       "      <td>Cc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>76.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1154</th>\n",
       "      <td>CHEMBL4636572</td>\n",
       "      <td>COc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>97.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>842 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     molecule_chembl_id  ... standard_value\n",
       "0          CHEMBL189657  ...          110.0\n",
       "1          CHEMBL188434  ...         4900.0\n",
       "10         CHEMBL376505  ...          120.0\n",
       "11         CHEMBL319467  ...          900.0\n",
       "14          CHEMBL14762  ...        18000.0\n",
       "...                 ...  ...            ...\n",
       "1150       CHEMBL187081  ...           12.0\n",
       "1151       CHEMBL495696  ...            5.0\n",
       "1152       CHEMBL359794  ...          103.0\n",
       "1153      CHEMBL4637319  ...           76.0\n",
       "1154      CHEMBL4636572  ...           97.0\n",
       "\n",
       "[842 rows x 3 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "selection = ['molecule_chembl_id','canonical_smiles','standard_value']\n",
    "df3 = df2[selection]\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yz0ae4mNnnr6"
   },
   "source": [
    "In case there is a mismatch in row numbering, we use a temporary file to get rid of it."
   ]
  },
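  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numbering mismatch matters because `pd.concat(..., axis=1)` aligns rows by index label, not by position. The sketch below illustrates this with hypothetical data (the `filtered` and `labels` objects and their values are made up for the example) and shows `reset_index(drop=True)` as one possible in-memory alternative to the temporary file.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame whose index has gaps after filtering (illustrative values).\n",
    "filtered = pd.DataFrame(\n",
    "    {'molecule_chembl_id': ['MOL_A', 'MOL_B', 'MOL_C']},\n",
    "    index=[0, 1, 10],\n",
    ")\n",
    "# A freshly built Series is numbered 0, 1, 2.\n",
    "labels = pd.Series(['active', 'intermediate', 'active'], name='bioactivity_class')\n",
    "\n",
    "# Concatenating directly aligns on index labels, so the rows no longer\n",
    "# line up and the gaps are filled with NaN.\n",
    "misaligned = pd.concat([filtered, labels], axis=1)\n",
    "\n",
    "# Renumbering the frame from 0 first makes the rows line up by position.\n",
    "aligned = pd.concat([filtered.reset_index(drop=True), labels], axis=1)\n",
    "\n",
    "print(misaligned)\n",
    "print(aligned)\n",
    "```\n",
    "\n",
    "One difference to keep in mind: reading the temporary CSV back with `pd.read_csv` also re-infers the column dtypes, which `reset_index` does not do."
   ]
  },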
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "9W4jf7srzfKa"
   },
   "outputs": [],
   "source": [
    "df3.to_csv('temp.csv',index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "9sSfFoFIzbkP"
   },
   "outputs": [],
   "source": [
    "df_temp = pd.read_csv('temp.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 423
    },
    "id": "Soyv4f22zwMm",
    "outputId": "9af4df15-9638-48d5-e2c0-f5d1ab255e94"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>standard_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>110.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>4900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CHEMBL376505</td>\n",
       "      <td>CN(C)c1nc2c(Br)c(Br)c(Br)c(Br)c2[nH]1</td>\n",
       "      <td>120.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CHEMBL319467</td>\n",
       "      <td>NS(=O)(=O)c1ccc(Nc2nc(OCC3CCCCC3)c3nc[nH]c3n2)cc1</td>\n",
       "      <td>900.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CHEMBL14762</td>\n",
       "      <td>CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1</td>\n",
       "      <td>18000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>837</th>\n",
       "      <td>CHEMBL187081</td>\n",
       "      <td>COc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>838</th>\n",
       "      <td>CHEMBL495696</td>\n",
       "      <td>N#Cc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>839</th>\n",
       "      <td>CHEMBL359794</td>\n",
       "      <td>N#Cc1ccc(Nc2nccc(-c3cnn4ncccc34)n2)cc1</td>\n",
       "      <td>103.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>840</th>\n",
       "      <td>CHEMBL4637319</td>\n",
       "      <td>Cc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>76.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>841</th>\n",
       "      <td>CHEMBL4636572</td>\n",
       "      <td>COc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>97.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>842 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    molecule_chembl_id  ... standard_value\n",
       "0         CHEMBL189657  ...          110.0\n",
       "1         CHEMBL188434  ...         4900.0\n",
       "2         CHEMBL376505  ...          120.0\n",
       "3         CHEMBL319467  ...          900.0\n",
       "4          CHEMBL14762  ...        18000.0\n",
       "..                 ...  ...            ...\n",
       "837       CHEMBL187081  ...           12.0\n",
       "838       CHEMBL495696  ...            5.0\n",
       "839       CHEMBL359794  ...          103.0\n",
       "840      CHEMBL4637319  ...           76.0\n",
       "841      CHEMBL4636572  ...           97.0\n",
       "\n",
       "[842 rows x 3 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_temp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "HWMtx5J8z2Wq"
   },
   "outputs": [],
   "source": [
    "df3 = df_temp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 423
    },
    "id": "Li64nUiZQ-y2",
    "outputId": "f6d5c244-e0db-49bf-a3e8-51c47ef7ee78"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>molecule_chembl_id</th>\n",
       "      <th>canonical_smiles</th>\n",
       "      <th>standard_value</th>\n",
       "      <th>bioactivity_class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CHEMBL189657</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
       "      <td>110.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CHEMBL188434</td>\n",
       "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
       "      <td>4900.0</td>\n",
       "      <td>intermediate</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CHEMBL376505</td>\n",
       "      <td>CN(C)c1nc2c(Br)c(Br)c(Br)c(Br)c2[nH]1</td>\n",
       "      <td>120.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CHEMBL319467</td>\n",
       "      <td>NS(=O)(=O)c1ccc(Nc2nc(OCC3CCCCC3)c3nc[nH]c3n2)cc1</td>\n",
       "      <td>900.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CHEMBL14762</td>\n",
       "      <td>CC[C@H](CO)Nc1nc(NCc2ccccc2)c2ncn(C(C)C)c2n1</td>\n",
       "      <td>18000.0</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>837</th>\n",
       "      <td>CHEMBL187081</td>\n",
       "      <td>COc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>12.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>838</th>\n",
       "      <td>CHEMBL495696</td>\n",
       "      <td>N#Cc1cccc(Nc2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>5.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>839</th>\n",
       "      <td>CHEMBL359794</td>\n",
       "      <td>N#Cc1ccc(Nc2nccc(-c3cnn4ncccc34)n2)cc1</td>\n",
       "      <td>103.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>840</th>\n",
       "      <td>CHEMBL4637319</td>\n",
       "      <td>Cc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>76.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>841</th>\n",
       "      <td>CHEMBL4636572</td>\n",
       "      <td>COc1cccc(N(C)c2nccc(-c3cnn4ncccc34)n2)c1</td>\n",
       "      <td>97.0</td>\n",
       "      <td>active</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>842 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    molecule_chembl_id  ... bioactivity_class\n",
       "0         CHEMBL189657  ...            active\n",
       "1         CHEMBL188434  ...      intermediate\n",
       "2         CHEMBL376505  ...            active\n",
       "3         CHEMBL319467  ...            active\n",
       "4          CHEMBL14762  ...          inactive\n",
       "..                 ...  ...               ...\n",
       "837       CHEMBL187081  ...            active\n",
       "838       CHEMBL495696  ...            active\n",
       "839       CHEMBL359794  ...            active\n",
       "840      CHEMBL4637319  ...            active\n",
       "841      CHEMBL4636572  ...            active\n",
       "\n",
       "[842 rows x 4 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bioactivity_class = pd.Series(bioactivity_class, name='bioactivity_class')\n",
    "df4 = pd.concat([df3, bioactivity_class], axis=1)\n",
    "df4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9tlgyexWh7YJ"
   },
   "source": [
    "Save the DataFrame to a CSV file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "nSNia7suXstR"
   },
   "outputs": [],
   "source": [
    "df4.to_csv('Dyrk1a_bioactivity_data_preprocessed.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UuZf5-MEd-H5",
    "outputId": "40f68e99-6439-4b0a-b8ca-e977900be1f6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 804\n",
      "-rw-r--r-- 1 root root  60892 Nov 26 05:02 Dyrk1a_bioactivity_data_preprocessed.csv\n",
      "-rw-r--r-- 1 root root 699557 Nov 26 04:58 Dyrk1a_bioactivity_data_raw.csv\n",
      "drwxr-xr-x 1 root root   4096 Nov 18 14:36 sample_data\n",
      "-rw-r--r-- 1 root root  53434 Nov 26 05:00 temp.csv\n"
     ]
    }
   ],
   "source": [
    "! ls -l"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZywB5K_Dlawb"
   },
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "CDD-ML-Part-1-Bioactivity-Data-Concised.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
